S&ZFH031001 

Compatible multi-channel coding/decoding

Field of the Invention

The present invention relates to an apparatus and a method
for processing a multi-channel audio signal and, in
particular, to an apparatus and a method for processing a
multi-channel audio signal in a stereo-compatible manner.

Background of the Invention and Prior Art 

In recent times, multi-channel audio reproduction techniques
have become more and more important. This may be due to the
fact that audio compression/encoding techniques such as the
well-known mp3 technique have made it possible to distribute
audio records via the Internet or other transmission
channels having a limited bandwidth. The mp3 coding
technique has become so famous because it allows
distribution of all the records in a stereo format, i.e., a
digital representation of the audio record including a first
or left stereo channel and a second or right stereo channel.

Nevertheless, there are basic shortcomings of conventional
two-channel sound systems. Therefore, the surround technique
has been developed. A recommended multi-channel surround
representation includes, in addition to the two stereo
channels L and R, an additional center channel C and two
surround channels Ls, Rs. This reference sound format is
also referred to as three/two-stereo, which means three
front channels and two surround channels. Generally, five
transmission channels are required. In a playback
environment, at least five speakers at the respective five
different places are needed to get an optimum sweet spot at
a certain distance from the five well-placed loudspeakers.

Several techniques are known in the art for reducing the
amount of data required for transmission of a multi-channel
audio signal. Such techniques are called joint stereo
techniques. To this end, reference is made to Fig. 10, which
shows a joint stereo device 60. This device can be a device
implementing e.g. intensity stereo (IS) or binaural cue
coding (BCC). Such a device generally receives, as an input,
at least two channels (CH1, CH2, ... CHn), and outputs a
single carrier channel and parametric data. The parametric
data are defined such that, in a decoder, an approximation
of an original channel (CH1, CH2, ... CHn) can be calculated.

Normally, the carrier channel will include subband samples,
spectral coefficients, time domain samples etc., which
provide a comparatively fine representation of the
underlying signal, while the parametric data do not include
such samples or spectral coefficients but include control
parameters for controlling a certain reconstruction
algorithm such as weighting by multiplication, time
shifting, frequency shifting, etc. The parametric data,
therefore, include only a comparatively coarse
representation of the signal or the associated channel.
Stated in numbers, the amount of data required by a carrier
channel will be in the range of 60 - 70 kbit/s, while the
amount of data required by parametric side information for
one channel will be in the range of 1.5 - 2.5 kbit/s.
Examples of parametric data are the well-known scale
factors, intensity stereo information or binaural cue
parameters as will be described below.
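
The figures quoted above can be put side by side with a small
calculation. The following is only an illustrative sketch: the
function name and the choice of one carrier channel plus side
information for two channels are assumptions made for the
example, not values taken from the text.

```python
def total_rate_kbps(num_carriers, carrier_kbps, num_side_channels, side_kbps):
    """Bit rate of the carrier channels plus parametric side information."""
    return num_carriers * carrier_kbps + num_side_channels * side_kbps


# Assumed scenario: one carrier channel and side information for two channels,
# evaluated at the lower and upper ends of the ranges quoted in the text.
low = total_rate_kbps(1, 60.0, 2, 1.5)
high = total_rate_kbps(1, 70.0, 2, 2.5)

# For comparison: two channels coded discretely at the lower carrier rate.
discrete_low = total_rate_kbps(2, 60.0, 0, 0.0)

print(low, high, discrete_low)
```

Under these assumptions the parametric side information adds
only a few kbit/s to the carrier, whereas a second discretely
coded channel would double the rate.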

Intensity stereo coding is described in AES preprint 3799,
"Intensity Stereo Coding", J. Herre, K. H. Brandenburg, D.
Lederer, February 1994, Amsterdam. Generally, the concept
of intensity stereo is based on a main axis transform to be
applied to the data of both stereophonic audio channels. If
most of the data points are concentrated around the first
principal axis, a coding gain can be achieved by rotating
both signals by a certain angle prior to coding. This is,
however, not always true for real stereophonic production
techniques. Therefore, this technique is modified by
excluding the second orthogonal component from transmission
in the bit stream. Thus, the reconstructed signals for the
left and right channels consist of differently weighted or
scaled versions of the same transmitted signal.
Nevertheless, the reconstructed signals differ in their
amplitude but are identical regarding their phase
information. The energy-time envelopes of both original
audio channels, however, are preserved by means of the
selective scaling operation, which typically operates in a
frequency selective manner. This conforms to the human
perception of sound at high frequencies, where the dominant
spatial cues are determined by the energy envelopes.

Additionally, in practical implementations, the transmitted
signal, i.e., the carrier channel, is generated from the sum
signal of the left channel and the right channel instead of
rotating both components. Furthermore, this processing,
i.e., generating intensity stereo parameters for performing
the scaling operation, is performed in a frequency-selective
manner, i.e., independently for each scale factor band,
i.e., encoder frequency partition. Preferably, both channels
are combined to form a combined or "carrier" channel, and,
in addition to the combined channel, the intensity stereo
information is determined, which depends on the energy of
the first channel, the energy of the second channel or the
energy of the combined channel.
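
The sum-carrier variant of intensity stereo described above
can be sketched as follows. This is a minimal illustration
under stated assumptions, not the method of the cited
preprint: the function names, the band layout and the choice
of the scale factor as the square root of an energy ratio are
all assumptions made for the example.

```python
import math


def band_energy(samples):
    """Sum of squared values inside one scale factor band."""
    return sum(s * s for s in samples)


def intensity_encode(left, right, band_edges):
    """Return the sum carrier and, per band, one scale factor per channel."""
    carrier = [l + r for l, r in zip(left, right)]
    scales = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        e_c = band_energy(carrier[lo:hi])
        if e_c == 0.0:
            # Silent carrier band: nothing to scale.
            scales.append((0.0, 0.0))
        else:
            scales.append((math.sqrt(band_energy(left[lo:hi]) / e_c),
                           math.sqrt(band_energy(right[lo:hi]) / e_c)))
    return carrier, scales


def intensity_decode(carrier, scales, band_edges):
    """Rebuild both channels as differently scaled copies of the carrier."""
    left, right = [], []
    bands = zip(band_edges[:-1], band_edges[1:])
    for (lo, hi), (g_l, g_r) in zip(bands, scales):
        left.extend(g_l * c for c in carrier[lo:hi])
        right.extend(g_r * c for c in carrier[lo:hi])
    return left, right


# Hypothetical spectral values, two bands of two coefficients each.
left = [2.0, 4.0, 1.0, 3.0]
right = [1.0, 2.0, 3.0, 9.0]
edges = [0, 2, 4]
carrier, scales = intensity_encode(left, right, edges)
rec_left, rec_right = intensity_decode(carrier, scales, edges)
```

In this example the two channels are proportional to each
other within every band, so the decoded channels match the
originals. In general only the per-band energies are
preserved, and both decoded channels share the phase of the
carrier.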

The BCC technique is described in AES convention paper
5574, "Binaural cue coding applied to stereo and
multi-channel audio compression", C. Faller, F. Baumgarte,
May 2002, Munich. In BCC encoding, a number of audio input
channels are converted to a spectral representation using a
DFT based transform with overlapping windows. The resulting
uniform spectrum is divided into non-overlapping partitions
each having an index. Each partition has a bandwidth
proportional to the equivalent rectangular bandwidth (ERB).
The inter-channel level differences (ICLD) and the
inter-channel time differences (ICTD) are estimated for each
partition for each frame k. The ICLD and ICTD are quantized
and coded resulting in a BCC bit stream. The inter-channel
level differences and inter-channel time differences are
given for each channel relative to a reference channel.
Then, the parameters are calculated in accordance with
prescribed formulae, which depend on the certain partitions
of the signal to be processed.

At a decoder-side, the decoder receives a mono signal and
the BCC bit stream. The mono signal is transformed into the
frequency domain and input into a spatial synthesis block,
which also receives decoded ICLD and ICTD values. In the
spatial synthesis block, the BCC parameters (ICLD and ICTD
values) are used to perform a weighting operation of the
mono signal in order to synthesize the multi-channel
signals, which, after a frequency/time conversion, represent
a reconstruction of the original multi-channel audio signal.
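
The level part of this scheme can be illustrated with a short
sketch. It is deliberately simplified and rests on several
assumptions: the ICLD is taken as a power ratio in dB per
partition, the reference channel itself stands in for the
signal that is weighted, and ICTD, quantization and the ERB
partition layout are omitted. All function names are
hypothetical.

```python
import math


def partition_powers(spectrum, edges):
    """Signal power inside each non-overlapping partition."""
    return [sum(abs(x) ** 2 for x in spectrum[lo:hi])
            for lo, hi in zip(edges[:-1], edges[1:])]


def estimate_icld_db(ref_spectrum, ch_spectrum, edges, eps=1e-12):
    """Level difference of a channel relative to the reference, in dB."""
    ref_p = partition_powers(ref_spectrum, edges)
    ch_p = partition_powers(ch_spectrum, edges)
    return [10.0 * math.log10((c + eps) / (r + eps))
            for r, c in zip(ref_p, ch_p)]


def apply_icld(ref_spectrum, icld_db, edges):
    """Weight the reference spectrum per partition by the decoded level cue."""
    out = []
    bands = zip(edges[:-1], edges[1:])
    for (lo, hi), d in zip(bands, icld_db):
        gain = 10.0 ** (d / 20.0)
        out.extend(gain * x for x in ref_spectrum[lo:hi])
    return out


# Hypothetical magnitude spectra with two partitions.
ref = [1.0, 1.0, 2.0, 2.0]
ch = [2.0, 2.0, 1.0, 1.0]
edges = [0, 2, 4]
icld = estimate_icld_db(ref, ch, edges)
approx = apply_icld(ref, icld, edges)
```

Here the channel is 6 dB louder than the reference in the
first partition and 6 dB quieter in the second, and weighting
the reference with these cues recovers the channel levels.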



In case of BCC, the joint stereo module 60 is operative to
output the channel side information such that the parametric
channel data are quantized and encoded ICLD or ICTD
parameters, wherein one of the original channels is used as
the reference channel for coding the channel side
information.

Normally, the carrier channel is formed of the sum of the
participating original channels.

Naturally, the above techniques only provide a mono repre- 
sentation for a decoder, which can only process the carrier 
channel, but is not able to process the parametric data for 
generating one or more approximations of more than one in- 
put channel. 

To transmit the five channels in a compatible way, i.e., in 
a bitstream format, which is also understandable for a nor- 
mal stereo decoder, the so-called matrixing technique has 
been used as described in "MUSICAM surround: a universal 
multi-channel coding system compatible with ISO 11172-3", 
G. Theile and G. Stoll, AES preprint 3403, October 1992, 
San Francisco. The five input channels L, R, C, Ls, and Rs 
are fed into a matrixing device performing a matrixing op- 
eration to calculate the basic or compatible stereo chan- 
nels Lo, Ro, from the five input channels. In particular, 
these basic stereo channels Lo/Ro are calculated as set out 
below:



Lo = L + xC + yLs 
Ro = R + xC + yRs 



x and y are constants. The other three channels C, Ls, Rs 
are transmitted as they are in an extension layer, in addi- 
tion to a basic stereo layer, which includes an encoded 
version of the basic stereo signals Lo/Ro. With respect to 
the bitstream, this Lo/Ro basic stereo layer includes a 
header, information such as scale factors and subband sam- 
ples. The multi-channel extension layer, i.e., the central 
channel and the two surround channels are included in the 
multi-channel extension field, which is also called ancil- 
lary data field. 

At a decoder-side, an inverse matrixing operation is per- 
formed in order to form reconstructions of the left and 
right channels in the five-channel representation using the 
basic stereo channels Lo, Ro and the three additional chan- 
nels. Additionally, the three additional channels are de- 
coded from the ancillary information in order to obtain a 
decoded five-channel or surround representation of the 
original multi-channel audio signal. 
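
The matrixing equations and the inverse operation at the
decoder can be sketched as follows. The function names and
the value 0.5 used for the constants x and y are assumptions
made for the example; the text only states that x and y are
constants.

```python
def matrix_downmix(L, R, C, Ls, Rs, x, y):
    """Compatible stereo channels: Lo = L + xC + yLs, Ro = R + xC + yRs."""
    Lo = [l + x * c + y * ls for l, c, ls in zip(L, C, Ls)]
    Ro = [r + x * c + y * rs for r, c, rs in zip(R, C, Rs)]
    return Lo, Ro


def dematrix(Lo, Ro, C, Ls, Rs, x, y):
    """Inverse matrixing: recover L and R from Lo, Ro and the three extras."""
    L = [lo - x * c - y * ls for lo, c, ls in zip(Lo, C, Ls)]
    R = [ro - x * c - y * rs for ro, c, rs in zip(Ro, C, Rs)]
    return L, R


# Hypothetical time domain samples, two per channel.
L, R = [1.0, 2.0], [3.0, 4.0]
C, Ls, Rs = [2.0, 2.0], [4.0, 0.0], [0.0, 4.0]
x = y = 0.5  # assumed values for the constants

Lo, Ro = matrix_downmix(L, R, C, Ls, Rs, x, y)
rec_L, rec_R = dematrix(Lo, Ro, C, Ls, Rs, x, y)
```

The round trip is exact here only because C, Ls and Rs are
used unmodified. Once those channels are quantized or
otherwise approximated, their errors carry over into the
dematrixed left and right channels.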

Another approach for multi-channel encoding is described in
the publication "Improved MPEG-2 audio multi-channel
encoding", B. Grill, J. Herre, K. H. Brandenburg, E.
Eberlein, J. Koller, J. Mueller, AES preprint 3865, February
1994, Amsterdam, in which, in order to obtain backward
compatibility, backward compatible modes are considered. To
this end, a compatibility matrix is used to obtain two
so-called downmix channels Lc, Rc from the original five
input channels. Furthermore, it is possible to dynamically
select the three auxiliary channels transmitted as ancillary
data.

In order to exploit stereo irrelevancy, a joint stereo
technique is applied to groups of channels, e.g. the three
front channels, i.e., the left channel, the right channel
and the center channel. To this end, these three channels
are combined to obtain a combined channel. This combined
channel is quantized and packed into the bitstream. Then,
this combined channel together with the corresponding joint
stereo information is input into a joint stereo decoding
module to obtain joint stereo decoded channels, i.e., a
joint stereo decoded left channel, a joint stereo decoded
right channel and a joint stereo decoded center channel.
These joint stereo decoded channels are, together with the
left surround channel and the right surround channel, input
into a compatibility matrix block to form the first and the
second downmix channels Lc, Rc. Then, quantized versions of
both downmix channels and a quantized version of the
combined channel are packed into the bitstream together
with joint stereo coding parameters.

Using intensity stereo coding, therefore, a group of
independent original channel signals is transmitted within a
single portion of "carrier" data. The decoder then
reconstructs the involved signals as identical data, which
are rescaled according to their original energy-time
envelopes. Consequently, a linear combination of the
transmitted channels will lead to results, which are quite
different from the original downmix. This applies to any
kind of joint stereo coding based on the intensity stereo
concept. For a coding system providing compatible downmix
channels, there is a direct consequence: The reconstruction
by dematrixing, as described in the previous publication,
suffers from artifacts caused by the imperfect
reconstruction. Using a so-called joint stereo predistortion
scheme, in which a joint stereo coding of the left, the
right and the center channels is performed before matrixing
in the encoder, alleviates this problem. In this way, the
dematrixing scheme for reconstruction introduces fewer
artifacts, since, on the encoder-side, the joint stereo
decoded signals have been used for generating the downmix
channels. Thus, the imperfect reconstruction process is
shifted into the compatible downmix channels Lc and Rc,
where it is much more likely to be masked by the audio
signal itself.

Although such a system has resulted in fewer artifacts
because of dematrixing on the decoder-side, it nevertheless
has some drawbacks. A drawback is that the stereo-compatible
downmix channels Lc and Rc are derived not from the original
channels but from intensity stereo coded/decoded versions of
the original channels. Therefore, data losses because of the
intensity stereo coding system are included in the
compatible downmix channels. A stereo-only decoder, which
only decodes the compatible channels rather than the
enhancement intensity stereo encoded channels, therefore,
provides an output signal, which is affected by intensity
stereo induced data losses.

Additionally, a full additional channel has to be
transmitted besides the two downmix channels. This channel
is the combined channel, which is formed by means of joint
stereo coding of the left channel, the right channel and the
center channel. Additionally, the intensity stereo
information to reconstruct the original channels L, R, C
from the combined channel also has to be transmitted to the
decoder. At the decoder, an inverse matrixing, i.e., a
dematrixing operation is performed to derive the surround
channels from the two downmix channels. Additionally, the
original left, right and center channels are approximated by
joint stereo decoding using the transmitted combined channel
and the transmitted joint stereo parameters. It is to be
noted that the original left, right and center channels are
derived by joint stereo decoding of the combined channel.

Summary of the Invention

It is the object of the present invention to provide a con- 
cept for a bit-efficient and artifact-reduced processing or 
inverse processing of a multi-channel audio signal. 

In accordance with a first aspect of the present invention, 
this object is achieved by an apparatus for processing a 
multi-channel audio signal, the multi-channel audio signal 
having at least three original channels, comprising: means 
for providing a first downmix channel and a second downmix 
channel, the first and the second downmix channels being 
derived from the original channels; means for calculating 
channel side information for a selected original channel of 
the original signals, the means for calculating being op- 
erative to calculate the channel side information such that 
a downmix channel or a combined downmix channel including 
the first and the second downmix channel, when weighted us- 
ing the channel side information, results in an approxima- 
tion of the selected original channel; and means for gener- 
ating output data, the output data including the channel 
side information, the first downmix channel or a signal de- 
rived from the first downmix channel and the second downmix 
channel or a signal derived from the second downmix channel.

In accordance with a second aspect of the present invention,
this object is achieved by a method of processing a
multi-channel audio signal, the multi-channel audio signal
having at least three original channels, comprising:
providing a first downmix channel and a second downmix
channel, the first and the second downmix channels being
derived from the original channels; calculating channel side
information for a selected original channel of the original
signals such that a downmix channel or a combined downmix
channel including the first and the second downmix channel,
when weighted using the channel side information, results
in an approximation of the selected original channel; and
generating output data, the output data including the
channel side information, the first downmix channel or a
signal derived from the first downmix channel and the second
downmix channel or a signal derived from the second downmix
channel.

In accordance with a third aspect of the present invention,
this object is achieved by an apparatus for inverse
processing of input data, the input data including channel
side information, a first downmix channel or a signal
derived from the first downmix channel and a second downmix
channel or a signal derived from the second downmix channel,
wherein the first downmix channel and the second downmix
channel are derived from at least three original channels
of a multi-channel audio signal, and wherein the channel
side information are calculated such that a downmix channel
or a combined downmix channel including the first downmix
channel and the second downmix channel, when weighted using
the channel side information, results in an approximation
of the selected original channel, the apparatus comprising:
an input data reader for reading the input data to obtain
the first downmix channel or a signal derived from the
first downmix channel and the second downmix channel or a
signal derived from the second downmix channel and the
channel side information; and a channel reconstructor for
reconstructing the approximation of the selected original
channel using the channel side information and the downmix
channel or the combined downmix channel to obtain the
approximation of the selected original channel.

In accordance with a fourth aspect of the present invention,
this object is achieved by a method of inverse processing of
input data, the input data including channel side
information, a first downmix channel or a signal derived
from the first downmix channel and a second downmix channel
or a signal derived from the second downmix channel,
wherein the first downmix channel and the second downmix
channel are derived from at least three original channels
of a multi-channel audio signal, and wherein the channel
side information are calculated such that a downmix channel
or a combined downmix channel including the first downmix
channel and the second downmix channel, when weighted using
the channel side information, results in an approximation
of the selected original channel, the method comprising:
reading the input data to obtain the first downmix channel
or a signal derived from the first downmix channel and the
second downmix channel or a signal derived from the second
downmix channel and the channel side information; and
reconstructing the approximation of the selected original
channel using the channel side information and the downmix
channel or the combined downmix channel to obtain the
approximation of the selected original channel.




In accordance with a fifth aspect and a sixth aspect of the
present invention, this object is achieved by a computer
program including the method of processing or the method of
inverse processing.

The present invention is based on the finding that an
efficient and artifact-reduced encoding of multi-channel
audio signals is obtained, when two downmix channels,
preferably representing the left and right stereo channels,
are packed into output data.

Inventively, parametric channel side information for one or
more of the original channels are derived such that they
relate to one of the downmix channels rather than, as [...]
the downmix channels or a combination of the downmix
channels to reconstruct an approximation of the original
audio channel, to which the channel side information is
assigned.

The inventive concept is advantageous in that it provides a
bit-efficient multi-channel extension such that a
multi-channel audio signal can be played at a decoder.

Additionally, the inventive concept is backward compatible,
since a lower scale decoder, which is only adapted for
two-channel processing, can simply ignore the extension
information, i.e., the channel side information. The lower
scale decoder can only play the two downmix channels to
obtain a stereo representation of the original multi-channel
audio signal. A higher scale decoder, however, which is
enabled for multi-channel operation, can use the channel
side information to reconstruct approximations of the
original channels.

The present invention is advantageous in that it is
bit-efficient, since no additional carrier channel beyond
the first and second downmix channels Lc, Rc is required.
Instead, the channel side information are related to one or
both downmix channels. This means that the downmix channels
themselves serve as a carrier channel, to which the channel
side information are [...], i.e., [...] subband samples or
spectral coefficients. [...] for weighting (in time and/or
frequency) the respective downmix channel or the combination
of the respective downmix channels to obtain a reconstructed
version of a selected original channel.
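
One way to read "weighting a downmix channel to obtain a
reconstructed version of a selected original channel" is a
per-band gain factor. The sketch below is an illustration
under that assumption only: the gain definition as the square
root of an energy ratio, the band layout and all names are
hypothetical and are not taken from the text.

```python
import math


def channel_side_info(original, carrier, band_edges):
    """Per-band gain so that gain * carrier has the energy of the original."""
    gains = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        e_orig = sum(v * v for v in original[lo:hi])
        e_car = sum(v * v for v in carrier[lo:hi])
        gains.append(math.sqrt(e_orig / e_car) if e_car > 0.0 else 0.0)
    return gains


def weight(carrier, gains, band_edges):
    """Apply one gain per band to the carrier (here: a downmix channel)."""
    out = []
    bands = zip(band_edges[:-1], band_edges[1:])
    for (lo, hi), g in zip(bands, gains):
        out.extend(g * v for v in carrier[lo:hi])
    return out


# Hypothetical spectral values: original left channel and left downmix.
L = [3.0, 4.0, 1.0, 2.0]
Lc = [4.0, 6.0, 2.0, 2.0]
edges = [0, 2, 4]

gains = channel_side_info(L, Lc, edges)
L_approx = weight(Lc, gains, edges)
band_energies = [sum(v * v for v in L_approx[lo:hi])
                 for lo, hi in zip(edges[:-1], edges[1:])]
```

Only the gains need to be transmitted in addition to the
downmix channel. The approximation matches the original in
per-band energy, not sample by sample.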

matrixing of the original channels 
signal. 

Inventively, channel side information for a selected
original channel is obtained based on joint stereo
techniques such as intensity stereo coding or binaural cue
coding. Thus, at the decoder side, no dematrixing operation
has to be performed. The problems associated with
dematrixing, i.e., certain artifacts related to an undesired
distribution of quantization noise in dematrixing
operations, are avoided. This is due to the fact that the
decoder uses a channel reconstructor, which reconstructs an
original signal by using one of the downmix channels or a
combination of the downmix channels and the transmitted
channel side information.

Preferably, the inventive concept is applied to a
multi-channel audio signal having five channels. These five
channels are a left channel L, a right channel R, a center
channel C, a left surround channel Ls, and a right surround
channel Rs. Preferably, downmix channels are stereo
compatible downmix channels Lc and Rc, which provide a
stereo representation of the original multi-channel audio
signal.

In accordance with the preferred embodiment of the present
invention, [...] data. Channel side information for the
original left channel are derived using the left downmix
channel. Channel side information for the original left
surround channel are derived using the left downmix channel.
Channel side information for the original right channel are
derived from the right downmix channel. Channel side
information for the original right surround channel are
derived from the right downmix channel.

In accordance with the preferred embodiment of the present
invention, channel information for the original center
channel are derived using the first downmix channel as well
as the second downmix channel, i.e., using a combination of
the two downmix channels. Preferably, this combination is a
summation.

Thus, the groupings, i.e., the relation between the channel
side information and the carrier signal, i.e., the used
downmix channel for providing channel side information for a
selected original channel, are such that, for optimum
quality, a certain downmix channel is selected, which
contains the highest possible relative amount of the
respective original multi-channel signal which is
represented by means of channel side information. As such a
carrier signal, the first and the second downmix channels
are used. Preferably, also the sum of the first and the
second downmix channels can be used. Naturally, the sum of
the first and second downmix channels can be used for
calculating channel side information for each of the
original channels. Preferably, however, the sum of the
downmix channels is used for calculating the channel side
information of the original center channel in a surround
environment such as five channel surround, seven channel
surround, [...]. Using the sum of the first and second
downmix channels is especially advantageous, since no
additional transmission overhead has to be performed. This
is due to the fact that both downmix channels are present at
the decoder such that summing of these downmix channels can
easily be performed at the decoder without requiring any
additional transmission bits.

Preferably, the channel side information forming the
multi-channel extension are input into the output data bit
stream in a compatible way such that a lower scale decoder
simply ignores the multi-channel extension data.
Nevertheless, a higher scale decoder not only uses the two
downmix channels but, in addition, employs the channel side
information to reconstruct a full multi-channel
representation of the original audio signal.

An inventive decoder is operative to firstly decode both
downmix channels and to read the channel side information.
Then, the channel side information and the downmix channels
are used to reconstruct approximations of the original
channels. To this end, preferably no dematrixing operation
at all is performed. This means that, in this embodiment,
each of the five original input channels is reconstructed
using five sets of different channel side information. In
the decoder, the same grouping as in the encoder is
performed for calculating the reconstructed channel
approximation. In a five-channel surround environment, this
means that, for reconstructing the original left channel,
the left downmix channel and the channel side information
for the left channel are used. To reconstruct the original
right channel, the right downmix channel and the channel
side information for the right channel are used. To
reconstruct the original left surround channel, the left
downmix channel and the channel side information for the
left surround channel are used. To reconstruct the original
right surround channel, the channel side information for the
right surround channel and the right downmix channel are
used. To reconstruct the original center channel, a combined
channel formed from the first downmix channel and the second
downmix channel and the center channel side information are
used.
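
The grouping described above can be written down as a small
lookup table. The sketch below is a simplified illustration:
it assumes one broadband gain per original channel as the
channel side information, and the names GROUPING and
reconstruct are hypothetical.

```python
# Which carrier is weighted to approximate which original channel,
# following the grouping described in the text.
GROUPING = {"L": "Lc", "Ls": "Lc", "R": "Rc", "Rs": "Rc", "C": "Lc+Rc"}


def reconstruct(lc, rc, side_info):
    """Approximate five channels from two downmix channels plus gains.

    side_info maps a channel name to a single gain (an assumption made
    for this sketch; per-band or per-frame parameters are equally possible).
    """
    carriers = {
        "Lc": lc,
        "Rc": rc,
        # The sum is formed at the decoder, so it costs no extra bits.
        "Lc+Rc": [a + b for a, b in zip(lc, rc)],
    }
    return {name: [side_info[name] * v for v in carriers[carrier]]
            for name, carrier in GROUPING.items()}


# Hypothetical decoded downmix samples and gains.
lc, rc = [1.0, 2.0], [3.0, 4.0]
side = {"L": 0.5, "Ls": 0.25, "R": 0.5, "Rs": 0.25, "C": 0.5}
out = reconstruct(lc, rc, side)
```

No dematrixing step appears anywhere in this sketch: every
output channel is a weighted copy of a downmix channel or of
the sum of both.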

Naturally, it is also possible to replay the first and
second downmix channels as the left and right channels such
that only three sets (out of e.g. five) of channel side
information parameters have to be transmitted. This is,
however, only advisable in situations where there are less
stringent rules with respect to quality. This is due to the
fact that, normally, the left downmix channel and the right
downmix channel are different from the original left channel
or the original right channel. Only in situations where one
cannot afford to transmit channel side information for each
of the original channels is such processing advantageous.

Brief Description of the Drawings

Preferred embodiments of the present invention are
subsequently discussed with reference to the attached
figures, in which:

Fig. 1 is a block diagram of a preferred embodiment of
the inventive encoder;

Fig. 2 is a block diagram of a preferred embodiment of
the inventive decoder;

Fig. 3A is a block diagram for a preferred implementation
of the means for calculating to obtain frequency
selective channel side information;

Fig. 3B is a preferred embodiment of a calculator
implementing joint stereo processing such as intensity
coding or binaural cue coding;

Fig. 4 illustrates another preferred embodiment of the
means for calculating channel side information, in which
the channel side information are gain factors;

Fig. 5 illustrates a preferred embodiment of an
implementation of the decoder, when the encoder is
implemented as in Fig. 4;

Fig. 6 illustrates a preferred implementation of the
means for providing the downmix channels;

Fig. 7 illustrates groupings of original and downmix
channels for calculating the channel side information for
the respective original channels;

Fig. 8 illustrates another preferred embodiment of an
inventive encoder;

Fig. 9 illustrates another implementation of an inventive
decoder; and

Fig. 10 illustrates a prior art joint stereo encoder.

Fig. 1 shows an apparatus for processing a multi-channel
audio signal having at least three original channels such
as R, L and C. Preferably, the original audio signal has
more than three channels, such as five channels in the
surround environment, which is illustrated in Fig. 1. The
five channels are the left channel L, the right channel R,
the center channel C, the left surround channel Ls and the
right surround channel Rs. The inventive apparatus includes
means 12 for providing a first downmix channel Lc and a
second downmix channel Rc, the first and the second downmix
channels being derived from the original channels. For
deriving the downmix channels from the original channels,
there exist several possibilities. One possibility is to
derive the downmix channels Lc and Rc by means of matrixing
the original channels using a matrixing operation as
illustrated in Fig. 6. This matrixing operation is performed
in the time domain.

The matrixing parameters a, b and t are selected such that
they are lower than or equal to 1. Preferably, a and b are
0.7 or 0.5. The overall weighting parameter t is preferably
chosen such that channel clipping is avoided.

Alternatively, as it is indicated in Fig. 1, the downmix
channels Lc and Rc can also be externally supplied. This may
be done, when the downmix channels Lc and Rc [...]. In this
scenario, a sound engineer mixes the downmix channels [...]
rather than by using an automated matrixing operation. The
sound engineer [...] channels to a subsequent calculating
means 14.

The calculating means 14 is operative to calculate the
channel side information such as l_i, ls_i, r_i or rs_i for
selected original channels such as L, Ls, R or Rs,
respectively. In particular, the means 14 for calculating is
operative to calculate the channel side information such
that a downmix channel, when weighted using the channel side
information, results in an approximation of the selected
original channel.

[...] results in an approximation of the selected original
channel. To show this feature in the figure, an adder 14a
and a combined channel side information calculator 14b are
shown.

It is clear for those skilled in the art that these elements
do not have to be implemented as distinct elements. Instead,
the whole functionality of the blocks 14, 14a, and 14b can
be implemented by means of a certain processor, which may be
a general purpose processor or any other means for
performing the required functionality.

Additionally, it is to be noted here that channel signals
being subband samples or frequency domain values are
indicated in capital letters. Channel side information are,
in contrast to the channels themselves, indicated by small
letters. The channel side information c_i is, therefore, the
channel side information for the original center channel C.

The channel side information as well as the downmix channels
Lc and Rc, or encoded versions thereof as produced by an
audio encoder 16, are input into an output data formatter.
Generally, the output data formatter acts as means for
generating output data, the output data including the
channel side information for at least one original channel,
the first downmix channel or a signal derived from the first
downmix channel (such as an encoded version thereof) and the
second downmix channel or a signal derived from the second
downmix channel (such as an encoded version thereof).

The output data or output bitstream can then be transmitted
to a bitstream decoder or can be stored [...] bitstream
which can also [...] not having a multi-channel extension
capability. Such lower scale decoders, such as most [...],
will simply ignore the multi-channel extension data [...]
such that a multi-channel audio impression is obtained.

' . „ , a preferred embodiment of the present inven- 

ng . 8 shows a prefer ; 

^Ts "d t: write the surround enhancement 
■ to the illary data field in the standardly m P 3 
data rnto the V surro und" bit stream rs 

,0 bit stream syntax such that an mp 

obtained. 
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Fig 2 shows an illustration o£ an inventive decoder acting 
'an apparatus to, inverse processing ^TZZ^ 
4- 99 The data received at tne xnpu<_ 

zj\r*»tr.r if. . -«.«-- 

the data received at data input port 

from the original data produced by the encoder. 

The decoder input data are input into a data stream reader 24 for
reading the input data to finally obtain the channel side information
26, the left downmix channel 28 and the right downmix channel 30. In
case the input data includes encoded versions of the downmix
channels, which corresponds to the case in which the audio encoder 16
in Fig. 1 is present, the data stream reader 24 also includes an
audio decoder, which is adapted to the audio encoder used for
encoding the downmix channels. In this case, the audio decoder, which
is part of the data stream reader 24, is operative to generate the
first downmix channel Lc and the second downmix channel Rc, or,
stated more exactly, a decoded version of those channels. For ease of
description, a distinction between signals and decoded versions
thereof is only made where explicitly stated.

The channel side information 26 and the left and right downmix
channels 28 and 30 output by the data stream reader 24 are fed into a
multi-channel reconstructor 32 for providing a reconstructed version
34 of the original audio signals, which can be played by means of a
multi-channel player 36. In case the multi-channel reconstructor is
operative in the frequency domain, the multi-channel player 36 will
receive frequency domain input data, which have to be decoded in a
certain way, such as converted into the time domain, before playing
them. To this end, the multi-channel player 36 may also include
decoding facilities.

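It is to be noted here that a lower scale decoder will only have the
data stream reader 24, which only outputs the left and right downmix
channels 28 and 30 to a stereo output. An enhanced inventive decoder
will, however, extract the channel side information 26 and use this
side information and the downmix channels 28 and 30 for
reconstructing reconstructed versions 34 of the original channels
using the multi-channel reconstructor 32.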
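
The two decoder classes can be contrasted with a minimal sketch. It is an illustration under stated assumptions, not the claimed implementation: `ReaderOutput`, `lower_scale_decode` and `enhanced_decode` are hypothetical names, the side information is reduced to one broadband gain per original channel, the carrier for the center channel is assumed to be the plain sum Lc + Rc, and the fallback to stereo when no side information is present is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ReaderOutput:
    """Hypothetical container for what data stream reader 24 delivers."""
    lc: List[float]                               # left downmix channel 28
    rc: List[float]                               # right downmix channel 30
    side_info: Optional[Dict[str, float]] = None  # channel side information 26


def lower_scale_decode(data: ReaderOutput) -> Dict[str, List[float]]:
    # Stereo-only decoder: the side information is simply ignored.
    return {"L": data.lc, "R": data.rc}


def enhanced_decode(data: ReaderOutput) -> Dict[str, List[float]]:
    # Assumption: without side information, fall back to plain stereo.
    if not data.side_info:
        return lower_scale_decode(data)
    # Assumption: the combined downmix channel is the sum Lc + Rc.
    combined = [l + r for l, r in zip(data.lc, data.rc)]
    carriers = {"L": data.lc, "Ls": data.lc,
                "R": data.rc, "Rs": data.rc, "C": combined}
    # Weight each carrier with the (simplified, broadband) gain.
    return {name: [gain * x for x in carriers[name]]
            for name, gain in data.side_info.items()}
```

The same input therefore yields two channels in a lower scale decoder and up to five reconstructed channels in an enhanced decoder.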

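Fig. 3A shows an embodiment of the inventive calculator 14 for
calculating the channel side information, in which an audio encoder
on the one hand and the channel side information calculator on the
other hand operate on the same spectral representation of the
multi-channel signal. Fig. 1, however, shows the other alternative,
in which the audio encoder on the one hand and the channel side
information calculator on the other hand operate on different
spectral representations of the multi-channel signal. When computing
resources are not as important as audio quality, the Fig. 1
alternative is preferred, since filter banks individually optimized
for audio encoding and side information calculation can be used.
When, however, computing resources are an issue, the Fig. 3A
alternative is preferred, since this alternative requires less
computing power because of a shared utilization of elements.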

The device shown in Fig. 3A is operative for receiving two channels
A, B. The device shown in Fig. 3A is operative to calculate side
information for channel B such that, using this channel side
information for the selected original channel B, a reconstructed
version of channel B can be calculated from the channel signal A.
Additionally, the device shown in Fig. 3A is operative to form
frequency domain channel side information, such as parameters for
weighting (by multiplying, or by time processing as in BCC coding,
e.g.) spectral values or subband samples. To this end, the inventive
calculator includes windowing and time/frequency conversion means
140a to obtain a frequency representation of channel A at an output
140b or a frequency domain representation of channel B at an output
140c.

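In the preferred embodiment, the side information determination (by
means of the side information determination means 140f) is performed
using quantized spectral values. Then, a quantizer 140d is also
present, which preferably is controlled using a psychoacoustic model
having a psychoacoustic model control input 140e. Nevertheless, a
quantizer is not required when the side information determination
means 140f uses a non-quantized representation of channel A for
determining the channel side information for channel B.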

In case the channel side information for channel B is calculated by
means of a frequency domain representation of channel A and the
frequency domain representation of channel B, the windowing and
time/frequency conversion means 140a can be the same as used in a
filterbank-based audio encoder. In this case, when AAC is considered,
means 140a is implemented as an MDCT filter bank (MDCT = modified
discrete cosine transform) with 50% overlap-and-add functionality.

In such a case, the quantizer 140d is an iterative quantizer such as
used when mp3 or AAC encoded audio signals are generated. The
frequency domain representation of channel A, which is preferably
already quantized, can then be directly used for entropy encoding
using an entropy encoder 140g, which may be a Huffman based encoder
or an entropy encoder implementing arithmetic encoding.

When compared to Fig. 1, the output of the device in Fig. 3A is the
side information, such as l_i, for one original channel
(corresponding to the side information for B at the output of device
140f). The entropy encoded bitstream for channel A corresponds to,
e.g., the encoded left downmix channel Lc' at the output of block 16
in Fig. 1. From Fig. 3A it becomes clear that element 14 (Fig. 1),
i.e., the calculator for calculating the channel side information,
and the audio encoder 16 (Fig. 1) can be implemented as separate
means or can be implemented as a shared version such that both
devices share several elements such as the MDCT filter bank 140a, the
quantizer 140d and the entropy encoder 140g. Naturally, in case one
needs a different transform etc. for determining the channel side
information, the encoder 16 and the calculator 14 (Fig. 1) will be
implemented in different devices such that both elements do not share
the filter bank etc.

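Generally, the actual determinator for calculating the side
information (or, generally stated, the calculator 14) may be
implemented as a joint stereo module as shown in Fig. 3B, which
operates in accordance with any of the joint stereo techniques such
as intensity stereo coding or binaural cue coding.

In contrast to prior art intensity stereo encoders, the inventive
determination means 140f does not have to calculate the combined
channel. The "combined channel" or carrier channel, as one can say,
already exists and is the left compatible downmix channel Lc, the
right compatible downmix channel Rc, or a combined version of these
downmix channels such as Lc + Rc. Therefore, the inventive device
140f only has to calculate the scaling information for scaling the
respective downmix channel such that the energy/time envelope of the
respective selected original channel is obtained when the downmix
channel is weighted using the scaling information or, as one can say,
the intensity directional information.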

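Therefore, the joint stereo module 140f in Fig. 3B is illustrated
such that it receives, as an input, the "combined" channel A, which
is the first or second downmix channel or a combination of the
downmix channels, and the original selected channel. This module,
naturally, outputs the "combined" channel A and the joint stereo
parameters as channel side information such that, using the combined
channel A and the joint stereo parameters, an approximation of the
original selected channel B can be calculated.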

Alternatively, the joint stereo module 140f can be implemented for
performing binaural cue coding.

In the case of BCC, the joint stereo module 140f is operative to
output the channel side information such that the channel side
information consists of quantized and encoded ICLD or ICTD
parameters, wherein the selected original channel serves as the
actual channel to be processed, while the respective downmix channel
used for calculating the side information, such as the first, the
second or a combination of the first and second downmix channels, is
used as the reference channel in the sense of the BCC coding/decoding
technique.

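Referring to Fig. 4, a simple energy-directed implementation of
element 140f is given. This device includes a frequency band selector
40 selecting a frequency band from channel A and a corresponding
frequency band of channel B. Then, in both frequency bands, an energy
is calculated by means of an energy calculator 42 for each branch.
The detailed implementation of the energy calculator 42 will depend
on whether the output signal from block 40 is a subband signal or
consists of frequency coefficients. In other implementations, where
scale factors for scale factor bands are calculated, one can already
use the scale factors of the first and second channels A, B as energy
values E_A and E_B, or at least as estimates of the energy. In a gain
factor calculating device 44, a gain factor g_B for the selected
frequency band is determined based on a certain rule such as the gain
determining rule illustrated in block 44 in Fig. 4. Here, the gain
factor g_B can directly be used for weighting time domain samples or
frequency coefficients, as will be described later with reference to
Fig. 5. To this end, the gain factor g_B, which is valid for the
selected frequency band, is used as the channel side information for
channel B as the selected original channel. This selected original
channel B will not be transmitted to the decoder but will be
represented by the parametric channel side information as calculated
by the calculator 14 in Fig. 1.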
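
The energy-directed determination can be sketched as follows. The text does not reproduce the gain determining rule of block 44, so the rule g_B = sqrt(E_B / E_A) used here is an assumption, as are the function names, the representation of bands by a list of edge indices, and the choice of a zero gain for an empty carrier band.

```python
import math
from typing import List, Sequence


def band_energy(values: Sequence[float]) -> float:
    # Energy calculator 42: sum of squared samples or coefficients.
    return sum(v * v for v in values)


def gain_factors(a: Sequence[float], b: Sequence[float],
                 band_edges: Sequence[int]) -> List[float]:
    """One gain per frequency band; band i covers band_edges[i]:band_edges[i+1]."""
    gains = []
    for lo, hi in zip(band_edges, band_edges[1:]):
        e_a = band_energy(a[lo:hi])   # energy of carrier channel A
        e_b = band_energy(b[lo:hi])   # energy of selected original channel B
        # Assumed rule: scale A so that its band energy matches that of B.
        gains.append(math.sqrt(e_b / e_a) if e_a > 0.0 else 0.0)
    return gains
```

With this rule, weighting a band of channel A by its gain yields a band whose energy equals that of the corresponding band of channel B, which is the property the scaling information is meant to provide.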

It is to be noted here that it is not necessary to transmit gain
values as channel side information. It is also sufficient to transmit
frequency dependent values related to the absolute energy of the
selected original channel. Then, the decoder has to calculate the
actual energy of the downmix channel and the gain factor based on the
downmix channel energy and the transmitted energy for channel B.
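
A minimal sketch of this alternative, in which the decoder derives the gain itself, is given below. The function name, the band-edge representation and the square-root rule are assumptions made for illustration; the text only states that the gain is computed from the downmix channel energy and the transmitted energy.

```python
import math
from typing import List, Sequence


def gains_from_transmitted_energy(a: Sequence[float],
                                  e_b_per_band: Sequence[float],
                                  band_edges: Sequence[int]) -> List[float]:
    """Decoder side: a is the decoded downmix channel, e_b_per_band the
    transmitted band energies of the selected original channel B."""
    gains = []
    for e_b, lo, hi in zip(e_b_per_band, band_edges, band_edges[1:]):
        # The decoder measures the actual energy of the downmix band.
        e_a = sum(v * v for v in a[lo:hi])
        # Assumed rule: gain that maps the measured energy to the sent one.
        gains.append(math.sqrt(e_b / e_a) if e_a > 0.0 else 0.0)
    return gains
```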

Fig. 5 shows a possible implementation of a decoder set up in
connection with a transform-based perceptual audio encoder. Compared
to Fig. 2, the functionalities of the entropy decoder and inverse
quantizer 50 (Fig. 5) will be included in block 24 of Fig. 2. The
functionality of the frequency/time converting elements 52a, 52b
(Fig. 5) will, however, be implemented in item 36 of Fig. 2. Element
50 in Fig. 5 receives an encoded version of the first or the second
downmix signal, Lc' or Rc'. At the output of element 50, an at least
partly decoded version of the first or the second downmix channel is
present, which is subsequently called channel A. Channel A is input
into a frequency band selector 54 for selecting a certain frequency
band from channel A. This selected frequency band is weighted using a
multiplier 56. The multiplier 56 receives, for multiplying, a certain
gain factor g_B, which is assigned to the frequency band selected by
the frequency band selector 54, which corresponds to the frequency
band selector 40 in Fig. 4 at the encoder side. At the input of the
frequency/time converter 52a, there exists, together with other
bands, a frequency domain representation of channel A. At the output
of multiplier 56 and, in particular, at the input of frequency/time
conversion means 52b, there will be a reconstructed frequency domain
representation of channel B. Therefore, at the output of element 52a,
there will be a time domain representation for channel A, while, at
the output of element 52b, there will be a time domain representation
of reconstructed channel B.
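
The band-wise weighting performed by the frequency band selector 54 and the multiplier 56 can be sketched as follows. The function name and the band-edge representation are assumptions for illustration, and the frequency/time conversion of elements 52a and 52b is deliberately left out.

```python
from typing import List, Sequence


def reconstruct_channel_b(a_spectrum: Sequence[float],
                          gains: Sequence[float],
                          band_edges: Sequence[int]) -> List[float]:
    """Weight each band of channel A with its gain g_B to approximate B."""
    b_spectrum = [0.0] * len(a_spectrum)
    for g_b, lo, hi in zip(gains, band_edges, band_edges[1:]):
        for k in range(lo, hi):               # band selector 54
            b_spectrum[k] = g_b * a_spectrum[k]  # multiplier 56
    return b_spectrum
```

The unweighted spectrum of channel A and the weighted spectrum of channel B would then each be passed to their own frequency/time converter.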

It is to be noted here that, depending on the certain implementation,
the decoded downmix channel Lc or Rc is not played back in a
multi-channel enhanced decoder. In such a multi-channel enhanced
decoder, the decoded downmix channels are only used for
reconstructing the original channels. The decoded downmix channels
are only replayed in lower scale stereo-only decoders.

To this end, reference is made to Fig. 9, which shows the preferred
implementation of the present invention in a surround/mp3
environment. An mp3 enhanced surround bitstream is input into a
standard mp3 decoder 24, which outputs decoded versions of the
original downmix channels. These downmix channels can then be
directly replayed by means of a low level decoder. Alternatively,
these two channels are input into the advanced joint stereo decoding
device 32, which also receives the multi-channel extension data,
which are preferably input into the ancillary data field in an mp3
compliant bitstream.

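Subsequently, reference is made to Fig. 7 showing the grouping of the
selected original channel and the respective downmix channel or
combined downmix channel. In this regard, the right column of the
table in Fig. 7 corresponds to channel A in Figs. 3A, 3B, 4 and 5,
while the column in the middle corresponds to channel B in these
figures. In the left column in Fig. 7, the respective channel side
information is explicitly stated. In accordance with the Fig. 7
table, the channel side information l_i for the original left channel
L is calculated using the left downmix channel Lc. The left surround
channel side information ls_i is determined by means of the original
selected left surround channel Ls, with the left downmix channel Lc
as the carrier. The right channel side information r_i for the
original right channel R is determined using the right downmix
channel Rc. Additionally, the channel side information for the right
surround channel Rs is determined using the right downmix channel Rc
as the carrier. Finally, the channel side information c_i for the
center channel C is determined using the combined downmix channel,
which is obtained by means of a combination of the first and the
second downmix channel, which can be easily calculated in both an
encoder and a decoder and which does not require any extra bits for
transmission.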
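
The grouping described for Fig. 7 can be written down as a small lookup, sketched here under assumptions: the names `CARRIER_FOR` and `carrier_signal` are hypothetical, and the combination used for the center channel is taken to be the plain sum Lc + Rc, which the text mentions only as one example of a combined version.

```python
from typing import Dict, List, Sequence

# Selected original channel (channel B) -> carrier (channel A), per Fig. 7.
CARRIER_FOR: Dict[str, str] = {
    "L": "Lc",
    "Ls": "Lc",
    "R": "Rc",
    "Rs": "Rc",
    "C": "Lc+Rc",
}


def carrier_signal(channel: str, lc: Sequence[float],
                   rc: Sequence[float]) -> List[float]:
    """Return the carrier samples used for the given original channel."""
    carrier = CARRIER_FOR[channel]
    if carrier == "Lc":
        return list(lc)
    if carrier == "Rc":
        return list(rc)
    # Assumed combination rule: sample-wise sum. Both encoder and decoder
    # can compute it, so no extra bits are needed.
    return [l + r for l, r in zip(lc, rc)]
```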

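Naturally, one could also calculate the channel side information for
the left channel, e.g., based on a combined downmix channel or even a
downmix channel which is obtained by a weighted addition of the first
and second downmix channels, such as 0.7 Lc and 0.3 Rc, as long as
the weighting parameters are known to a decoder or transmitted
accordingly. For most applications, however, it will be preferred to
only derive channel side information for the center channel from the
combined downmix channel, i.e., from a combination of the first and
second downmix channels.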
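
A weighted addition of this kind can be sketched as follows. The function name and its default arguments are hypothetical; the weights 0.7 and 0.3 are the example values from the text, and the point illustrated is only that encoder and decoder must apply the same weights.

```python
from typing import List, Sequence


def weighted_downmix(lc: Sequence[float], rc: Sequence[float],
                     w_l: float = 0.7, w_r: float = 0.3) -> List[float]:
    # Carrier obtained by weighted addition of the two downmix channels.
    # The decoder must use the same w_l and w_r, either because they are
    # fixed in advance or because they are transmitted.
    return [w_l * l + w_r * r for l, r in zip(lc, rc)]
```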

To show the bit saving potential of the present invention, the
following typical example is given. In case of a five-channel audio
signal, a normal encoder needs a bit rate of 64 kbit/s for each
channel, amounting to an overall bit rate of 320 kbit/s for the
five-channel signal. The left and right stereo signals require a bit
rate of 128 kbit/s. Channel side information for one channel is
between 1.5 and 2 kbit/s. Thus, even in a case in which channel side
information for each of the five channels is transmitted, this
additional data adds up to only 7.5 to 10 kbit/s. Thus, the inventive
concept allows transmission of a five-channel audio signal using a
bit rate of 138 kbit/s (compared to 320 kbit/s) with good quality,
since the decoder does not use the problematic dematrixing operation.
Probably even more important is the fact that the inventive concept
is fully backward compatible, since each of the existing mp3 players
is able to replay the first downmix channel and the second downmix
channel to produce a conventional stereo output.
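
The arithmetic of this example can be checked with a short sketch. The function names are hypothetical; the figures (64 kbit/s per discrete channel, 128 kbit/s for the stereo pair, 1.5 to 2 kbit/s of side information per channel) are those given in the text.

```python
from typing import Tuple


def discrete_bitrate_kbps(n_channels: int = 5,
                          per_channel_kbps: float = 64.0) -> float:
    # Every channel is coded separately.
    return n_channels * per_channel_kbps


def compatible_bitrate_kbps(stereo_kbps: float = 128.0,
                            side_info_range_kbps: Tuple[float, float] = (1.5, 2.0),
                            n_channels: int = 5) -> Tuple[float, float]:
    # Stereo downmix plus parametric side information for each channel.
    low, high = side_info_range_kbps
    return (stereo_kbps + n_channels * low,
            stereo_kbps + n_channels * high)
```

Under these figures, `discrete_bitrate_kbps()` gives 320.0 and `compatible_bitrate_kbps()` gives (135.5, 138.0), so the 138 kbit/s quoted above is the upper end of the range.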

Depending on the application environment, the inventive method for
processing or inverse processing can be implemented in hardware or in
software. The implementation can be a digital storage medium such as
a disk or a CD having electronically readable control signals, which
can cooperate with a programmable computer system such that the
inventive method for processing or inverse processing is carried out.
Generally stated, the invention therefore also relates to a computer
program product having a program code stored on a machine-readable
carrier, the program code being adapted for performing the inventive
method when the computer program product runs on a computer. In other
words, the invention therefore also relates to a computer program
having a program code for performing the method when the computer
program runs on a computer.